INTRODUCTION TO THIS SPECIAL ISSUE


“FAIR enough”... A question asked on a daily basis in the rapidly evolving field of open science and
the underpinning data stewardship profession. Since their publication in 2016, the FAIR principles
have sparked theoretical debates, and some communities have already begun to implement FAIR-guided
data and services. No one really argues against the idea that data, as well as the accompanying workflows
and services, should be findable, accessible under well-defined conditions, interoperable without data
munging, and thus optimally reusable. Being FAIR is not a goal in itself; FAIR Data and Services are needed
to enable data-intensive research and innovation and (thus) have to be “AI-ready” (= future proof for
machines to optimally assist us). However, the fact that science and innovation are becoming increasingly
“machine-assisted”, and hence the central role of machines, is still overlooked in some cases when people
claim to implement FAIR.


FAIR is a community journey 


The FAIR guiding principles are exactly that: a guide to creating reusable research objects. FAIR is not a
new standard, not a top-down requirement, and not an all-or-nothing binary state (FAIR or not FAIR). The
FAIR principles were conceived and designed as a resource for optimal choices to be made during many
aspects of data and tool generation as well as (re)use and long-term stewardship. They serve the purpose
of guiding implementation considerations on our journey to machine-assisted research and innovation.
Therefore this special issue of the Data Intelligence journal is dedicated to practical first generation 
implementation choices that are being made by communities of practice, relevant for the FAIR status of 
data and accompanying tooling. This issue also features “opinion articles” that address challenges 
encountered or anticipated along the implementation trajectory of FAIR for which there are no
ready-to-re-use solutions. These early examples and visionary discussions may inspire the further development of
interoperable approaches by various early mover communities, who are convinced that both data and 
services (including tooling) should be FAIR to enable the envisioned Internet of FAIR Data and Services 
(IFDS) [1]. This is all “early stage”, because the FAIR principles were published in their current form only 
in March 2016 [2]. The FAIR principles did not mark a major new insight, but were rather a consolidation 
and a comprehensive rephrasing of a series of earlier foundational and pioneering approaches (some 
decades in the making) to move toward a machine-friendly research infrastructure. Several papers in this 
issue make this point in historical context [3,4,5]. 


FAIR “hit a chord” 


Why then is it that the FAIR acronym so rapidly sparked the discussions at the domain sciences level, 
got wide recognition at the policy level [6], among funders [7], repositories [8] and not only in Europe and 
the USA, but also in Africa [9], Latin America [10], and China [11,12]? There may be two important 
differences with earlier efforts. First, after the inception of the FAIR principles in January 2014 at the Lorentz
workshop in Leiden, The Netherlands, the principles were initially posted on the FORCE11 website for
community comments and subsequently published in Scientific Data in March 2016 [2]. The article
apparently “hit a chord” and was massively read, tweeted, discussed in blogs and cited [see box below].


At the press date of this Data Intelligence special issue on emerging FAIR practices, the commentary
had already received 67,000 article accesses and over 1,600 citations in Google Scholar, had an altmetrics score of 1,385, and
was ranked 1st among the articles of a similar age in Scientific Data and consistently scored in the 99th percentile
(ranked 74th) of the 264,581 tracked articles of a similar age in all journals. The article was tweeted 1,243
times, appeared in 77 blogs and was picked up by 84 news outlets. After three years, the rate of citation
continues to increase and is currently almost two citations per day in Google Scholar.


Second, in addition to a sticky acronym, we strongly believe that the inception of the FAIR principles at the
Lorentz workshop in Leiden (January 2014) marked a natural tipping point, also caused by the very visible
discussions in the scope of the European Open Science Cloud preparations [1], which in turn led to an
“attraction phase” toward a common approach as described for many other major infrastructures [13].


https://www.universiteitleiden.nl/en/news/2016/03/the-fair-principles-herald-more-open-transparent-and-reusable-scientific-data.


FAIR is already implemented in communities of practice


In a recent student-led survey, referenced in this issue [14], it was shown that 80% of the papers citing
the original FAIR paper actually deal with practical implementations. This indicates that, next to the
“political hype” caused by the acronym, a growing number of organizations and communities have actually
attempted to forge technical implementation choices that adhere to the FAIR guiding principles. Some of
them are described in this issue, but there are many more that can be found in the literature. The
implementations are apparently emerging in the entire spectrum of science and innovation domains and 
include life sciences (notably biomedicine and health, biodiversity and agriculture), nuclear energy, climate 
change, ocean research, humanities, economics, space science and mineralogy and many more. Furthermore, 
data science related implementations are also numerous, such as ontology mapping, machine learning 
algorithms, ontology-based access protocols, automation technology, annotation and curation, and many 
aspects practiced in the emerging profession of data stewardship in data competence centers in institutions 
all over the world [15]. Unfortunately, 80% of the citations so far are from Europe and the USA [14]. It is very
encouraging, though, that this issue reports on strong FAIR-related activities, not only in Europe [5] and the USA
[16], but also in Africa [9] and Latin America [10]. Moreover, international organizations such as the Research
Data Alliance (RDA) [5], the Committee on Data for Science and Technology (CODATA) [17,18], the European
Strategy Forum on Research Infrastructures (ESFRI) [4] and scientific unions such as AGU [15] and IUPAC
[18] are leading their domain communities toward more mature FAIR choices and infrastructures. Ever since
their publication, the FAIR principles have sparked converging initiatives such as GO FAIR and
strong collaborations between GO FAIR and other international data-related initiatives including the
Committee on Data of the International Science Council (ISC’s CODATA), its World Data System, and
the Research Data Alliance (RDA). Based on all the above, it is likely that well over 1,000 communities
of practice already work on some implementation aspects guided by the FAIR principles (i.e., 80% of more than
1,600 articles citing the FAIR principles paper [2]).


“My Machine knows what I mean” 


So, what does “being FAIR enough” actually mean? First of all, this will vary widely for different 
communities and domains and will be ultimately decided by the communities of practice that adopt 
policies supporting machine-actionable data, that aim to “de-silo” and that strive to overcome disciplinary 
boundaries. 


Still, in this very early implementation phase there has also been quite some confusion and anxiety about
what the FAIR principles actually cover. As a result, several “additional acronym letters” have been proposed,
even in some early draft articles for this issue. So far, all of these proposed changes could be resolved
without changing the powerful acronym, either because they could be classified as a specific “implementation
choice” [19] or because they were “beyond FAIR” [20], since they addressed issues that, “by design”, the
FAIR principles do not cover, such as ethics, privacy, reproducibility, and data or software quality per se. Many
of these very important aspects are implicitly related to the findability of software and data, their accessibility,
interoperability and therefore the ability to reuse these research objects, but they should not be conflated
with the FAIR principles themselves. These were designed to strictly cover the inherent machine-FAIRness
of data and services. In that sense, even fabricating fake data, making them FAIR and publishing
them in a Core Trust Seal Repository would not violate the principles, especially when the metadata
indicate that the data are fabricated, for example as a machine learning (ML) training set. Conversely, putting
high-quality data in a mediocre repository cannot be prevented by the principles as such, although
obviously, when the only repository in which data or code was published is offline or not findable for other
reasons, the FAIR principles are not properly followed. Abuse of the FAIR acronym is related to specific,
stakeholder-defined implementations, some of which are tangentially addressed by the FAIR principles
and others not.


https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5577767787428752324,13506161023122668610.
http://www.codata.org/.
https://www.icsu-wds.org/.


In this issue [19], the original conception of the FAIR principles and what they are intended to cover is 
explained in detail. In an attempt to narrow down to the essence of what the original composers of the 
FAIR guiding principles had in mind, we would like to introduce an even higher level of abstraction than 
the principles themselves: the trigger for so much international attention for better data stewardship and
Open Science is likely correlated with the data explosion we have created through ever-increasing automation
and instrumentation advances. It follows that we need “machines”, both as creators of data and as analytical
assistants, all the time; we had better make them as efficient and collaborative as possible. So at their very core,
the FAIR guiding principles should lead us to ensure that “Machines know what it means”. Obviously, this 
does not (yet) take people out of the loop. In fact the envisioned Internet of FAIR Data and Services [1] 
should be an environment where our implementation choices support both machines and humans, in a 
tight and iterative collaboration (i.e., “Social Machines” [21] are the end users). 


FAIR Data are the substrate for FAIR tools and services for Open Science 


Open Science is in fact a new way of doing and communicating science with an emphasis on reusability 
of data and the accompanying analytics, not only by other researchers, but ostensibly also by machines. 
Hence the one-liner that captures the essence of the FAIR principles is “Machines know what it means”. 
So, do we trivialize the role of humans in science and innovation? On the contrary; publishing our major 
research output (data, software tools, derived information and major scientific conclusions and claims) in 
FAIR format, will enable computers to “also” output the relevant information in precise, human-digestible 
formats, meanwhile mitigating ambiguity introduced by natural language, and effectively even crossing 
jargon, ontological false agreements and false disagreements [22], and ultimately even natural language 
barriers. It should also be emphasized that “data” should always be published with supplementary narrative
for humans to judge and evaluate the data and information we provide in machine-readable and actionable
formats [23].


https://www.rd-alliance.org/groups/fair-data-maturity-model-wg.


Human prosaic narrative, graphical figures and tables and most supplementary data, in the formats we 
have used for scholarly communication for centuries, are “a nightmare for machines” and therefore 
intrinsically, in their native form, do not comply with FAIR principles, which obviously does not make them 
useless for human reuse. The good news is that precise scientific claims in legacy text as well as the 
supporting data can be transformed into FAIR formats with increasing relative ease. Human readable text, 
tables and figures for human intellectual consumption can also increasingly be produced by machines, for 
instance from relational databases to RDF and vice versa [24]. Supporting both machines and humans in 
their collaborative work is therefore the major contribution the FAIR guiding principles are supposed to 
make to 21st century research and innovation throughout the world. An important notion is also that research
objects that are not “digital” or otherwise are not machine interpretable, such as geological and biological
specimens, analogue pictures, PDFs and the like, can nonetheless always be adorned with FAIR metadata,
creating a “digital twin” [3, 5]. In this context, beyond the original coining of the term by Michael Grieves
in 2002 [25], we see a “digital twin” of a non-digital research object as a set of machine-readable metadata
and instructions that allow machines to detect and resolve to the location of the object via its unique
identifier [26], and to make the best possible interpretation of what the object is, what operations on it are technically
possible and what is allowed to be done with the digital objects in the twin. The actual research object
can be anything from a molecule to a packaged data and workflow object [27], to one of the 3 billion
biological specimens in natural history museums [3], to citizens in FAIR-driven research and care
environments such as the Personal Health Train [28]. FAIR digital objects, sometimes pointing to other
digital objects and sometimes to objects in the physical world, thus form the basic substrate for
machine-assisted science and innovation.
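
As a purely illustrative sketch of the “digital twin” notion described above, the following Python fragment models a twin as a small metadata record that a machine can resolve via an identifier and then query for what the object is, which operations are technically possible and which are allowed. The class name, field names, identifier scheme and in-memory registry are all hypothetical assumptions made for this illustration; they are not taken from the FAIR principles or from any article in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalTwin:
    """Hypothetical metadata record describing a (possibly non-digital) research object."""

    identifier: str  # unique, persistent identifier of the object
    location: str  # where the identifier resolves to
    object_type: str  # what the object is
    possible_operations: frozenset  # operations that are technically possible
    permitted_operations: frozenset  # operations that are allowed

    def may_perform(self, operation: str) -> bool:
        """True only if the operation is both technically possible and allowed."""
        return (
            operation in self.possible_operations
            and operation in self.permitted_operations
        )


# Toy in-memory registry standing in for a real identifier resolver (an assumption).
_REGISTRY = {}


def register(twin: DigitalTwin) -> None:
    """Make a twin findable under its identifier."""
    _REGISTRY[twin.identifier] = twin


def resolve(identifier: str) -> DigitalTwin:
    """Return the twin for an identifier, or raise KeyError if it is unknown."""
    return _REGISTRY[identifier]


# Made-up example: a museum specimen that may be photographed but not sampled.
specimen = DigitalTwin(
    identifier="example:specimen/0001",
    location="hypothetical museum, cabinet 12, drawer 3",
    object_type="biological specimen",
    possible_operations=frozenset({"photograph", "sample_tissue"}),
    permitted_operations=frozenset({"photograph"}),
)
register(specimen)

twin = resolve("example:specimen/0001")
print(twin.object_type, twin.may_perform("photograph"), twin.may_perform("sample_tissue"))
```

The sketch keeps “technically possible” and “allowed” as separate sets on purpose, so that a machine can tell an operation that cannot be done apart from one that may not be done.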


Pioneering choices in FAIR implementations 


In the first series of 15 articles in this special issue we have bundled a relevant set of “first generation” 
implementations and emerging practices in the context of FAIR. These are followed by 12 articles that focus 
more on gaps in existing technology and practice encountered or envisioned and offer opinion and propose 
directional solutions for the relevant communities to develop FAIR guided approaches. The Implementation 
Articles cover the overall protocols and operations to enable efficient handling of FAIR Digital Objects in 
the Internet and Web environments. The very first requirement is that each FAIR Digital Object has its
own Unique, Persistent, Resolvable Identifier [26]. Machines subsequently need instructions and workflows,
and these can and should be FAIR themselves to effectively participate in the Internet of FAIR Data and
Services [27]. Next to that, data, even when they cannot be Open Access without restrictions and have to be as
“closed off” as necessary, can nevertheless be FAIR [29]. For all research objects, restricted in use or not,
there is a need for machines to independently access the data and “understand” what kind of (machine 
and human) operations are possible and allowed [30]. The actual reuse of the data is subsequently always 
subject to a user license, however liberal [31]. This is for instance critical for industrial use of data, as data
and other research objects that have not been properly licensed are viewed as having uncertain legal liability,
and thus cannot be easily reused by industry [32]. A very important aspect of the wide acceptance of FAIR
data as a first-class research output is that data are properly (indeed, automatically) cited upon reuse.
Technologies to make effective and scalable data citation possible are in their early stages, but they will 
soon be well established [33]. A number of pioneering domain specific implementation efforts and choices 
to help make data and metadata FAIR are also emerging [34]. In addition, tools that enable planning of 
FAIR compliant metadata files and data management and stewardship plans are being developed and tested 
[35], and increasingly these interact with tools that expose FAIR standards in dedicated FAIR repositories 
to stimulate reuse [36]. As stated earlier, FAIR data alone remain a lame substrate, unless there are FAIR 
data consuming workflows, that in the Internet of FAIR Data and Services should be developed according 
to FAIR principles themselves, which poses a whole additional set of choices and challenges [37]. The 
“implementation section” is completed by a set of articles that describe how the various choices impact 
their own and potentially other disciplines, such as sensitive personal health data [28], data describing 
physical objects rather than digital objects, such as in biodiversity collections, biobanks and geosciences 
[3], and a massive cross-cutting domain such as chemistry [18]. In all these areas there are different, but 
also overlapping legal aspects associated with the reuse of data and workflows that should be addressed 
when publishing data and code for reuse [38]. Once all these decisions have been made, a final, and very 
important decision for FAIR-oriented researchers and data stewards is actually “Where to publish and
archive my data with maximum chance of proper reuse?” This means that data repositories, too, should
consider the aspects of FAIR metadata and the data collections themselves and how they can support long 
term reuse and preservation [35]. Finally it is very important to develop tooling that, as objectively as 
possible, can measure the maturity of FAIRness of digital resources [19,20 & footnotes 6-9] clearly 
demonstrating that FAIR is not a “binary” status, but an aspiration to move from scholarly communication 
assets that are “re-useless” for machines toward increasingly machine-actionable elements of an emerging 
Internet of FAIR Data and Services [33]. 


Finally, we realized early on that FAIR should not, and will not be, exclusively an academic exercise. 
As happened in the Internet as we know it, sooner rather than later, private institutions and industry will 
(and should) join and much of what we will see happening on the route to the envisioned Internet of FAIR 
Data and Services will be done in public-private partnerships and in some cases scaled by large industries. 
It is important that at such an early stage, many industrial partners are already highly interested and 
contributing their views on what their needs are [32]. A balanced development of a professional (indeed, 
commercial) backbone on which both academic and industrial applications can run, will be as important 
for the Internet of FAIR Data and Services as it is for the current Internet. Moreover, given the commitments 
to Open Research and the “long-tail” of technically-rich disciplines participating in the Internet of FAIR 
Data and Services, the principles of net neutrality and open standards will necessarily feature prominently 
and irrevocably regardless of the emerging business models. 


Outlook 


It will be obvious after reading this special issue that FAIR compliant data stewardship will require many 
different skills that are not traditionally covered by the research curricula of contemporary students and 
researchers. Therefore, extensive training capacity and training materials are needed and still have to be
developed. Some academic tools under development have been described in [39], and also commercial
training options and in-company FAIR competence centers are being developed [32]. Public research 
funders [7] as well as data driven private endeavors will increasingly call for proper (and funded) data 
stewardship plans, not only for research outputs, but also in the data-intensive processes for product 
approval, legislation and certification [32]. International agreements will be needed on good practices that 
can form the basis for better, FAIR and Open Science as well as for well documented innovation and 
production processes. The envisioned Internet of FAIR Data and Services will form a backbone for this 
future societal innovation and may be of very high impact on human wellbeing and the responsible 
stewardship of our planet. 


Thus, this special issue is entirely based on the concept of reusable FAIR digital objects that effectively 
form “one computer with one data set” as suggested by George Strawn in this issue, and which is distributed 
over the planet but functionally interconnected and interoperable by FAIR principles, for fair and equal use. 
This being said, we are fully aware of the limited and lumpy scope of the articles we were able to collect 
for this issue. Looking forward, the publisher and the co-editors therefore encourage the community at large
to contribute additional collections of practical Implementation Choices and recognized Challenges to
what we envision will be a recurring special issue.